Model-driven Scheduling for Distributed Stream Processing Systems

Authors

  • Anshu Shukla
  • Yogesh L. Simmhan
Abstract

Distributed Stream Processing Systems (DSPS) are increasingly used as the Internet of Things (IoT) evolves. These frameworks are designed to adapt to a dynamic input message rate by scaling in or out. Apache Storm, originally developed by Twitter, is a widely used stream processing engine; others include Flink [8] and Spark Streaming [73]. To run streaming applications successfully, one needs to know their optimal resource requirement, because over-estimating resources adds extra cost. A strategy is therefore needed to determine the optimal resource requirement for a given streaming application. In this article, we propose a model-driven approach for scheduling streaming applications that effectively utilizes a priori knowledge of the applications to provide predictable scheduling behavior. Specifically, we use application performance models to offer reliable estimates of the resource allocation required. Further, this intuition also drives resource mapping and helps narrow the gap between estimated and actual dataflow performance and resource utilization. Together, this model-driven scheduling approach gives predictable application performance and resource utilization when executing a given DSPS application at a target input stream rate on distributed resources.
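The core idea of the abstract, using per-task performance models to estimate the resources needed for a target input rate, can be illustrated with a minimal Python sketch. Everything in it is a hypothetical assumption for illustration, not the authors' models or allocation algorithm: the `Task` class, the profiled `rate_model` tables (threads mapped to peak sustainable input rate), the constant per-task `selectivity`, the rule that every downstream task receives an upstream task's full output, the `threads_per_slot` packing factor, and all numbers.

```python
import math
from dataclasses import dataclass


@dataclass
class Task:
    """A dataflow task with a hypothetical profiled performance model."""
    name: str
    selectivity: float             # output msgs per input msg (assumed constant)
    rate_model: dict[int, float]   # threads -> peak sustainable input rate (msg/s)


def propagate_rates(tasks: list[Task], edges: list[tuple[str, str]],
                    source_rate: float) -> dict[str, float]:
    """Input rate each task must sustain.

    Assumes tasks are listed in topological order with tasks[0] as the source,
    and that each downstream task receives the full output of its upstream task.
    """
    rates = {t.name: 0.0 for t in tasks}
    rates[tasks[0].name] = source_rate
    for t in tasks:
        out_rate = rates[t.name] * t.selectivity
        for src, dst in edges:
            if src == t.name:
                rates[dst] += out_rate
    return rates


def threads_needed(task: Task, rate: float) -> int:
    """Smallest profiled thread count whose modeled peak rate covers the required rate."""
    for threads in sorted(task.rate_model):
        if task.rate_model[threads] >= rate:
            return threads
    raise ValueError(f"{task.name}: no profiled configuration sustains {rate} msg/s")


def allocate(tasks: list[Task], edges: list[tuple[str, str]],
             source_rate: float, threads_per_slot: int) -> tuple[dict[str, int], int]:
    """Per-task thread counts plus the number of resource slots to request."""
    rates = propagate_rates(tasks, edges, source_rate)
    threads = {t.name: threads_needed(t, rates[t.name]) for t in tasks}
    slots = math.ceil(sum(threads.values()) / threads_per_slot)
    return threads, slots


# Hypothetical three-task pipeline: parse -> filter -> store.
tasks = [
    Task("parse", 1.0, {1: 500.0, 2: 900.0, 3: 1200.0}),
    Task("filter", 0.5, {1: 300.0, 2: 550.0, 3: 800.0}),
    Task("store", 0.0, {1: 250.0, 2: 450.0}),
]
edges = [("parse", "filter"), ("filter", "store")]
threads, slots = allocate(tasks, edges, source_rate=800.0, threads_per_slot=4)
print(threads, slots)  # {'parse': 2, 'filter': 3, 'store': 2} 2
```

Choosing the smallest profiled configuration that still meets each task's required rate is what keeps such an estimate from over-provisioning. Mapping those threads onto concrete slots or machines is a separate step that this sketch does not attempt.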

Similar resources

Network-Aware Workload Scheduling for Scalable Linked Data Stream Processing

In order to cope with the ever-increasing data volume, distributed stream processing systems have been proposed. To ensure scalability most distributed systems partition the data and distribute the workload among multiple machines. This approach does, however, raise the question how the data and the workload should be partitioned and distributed. A uniform scheduling strategy—a uniform distribu...

Scalable Linked Data Stream Processing via Network-Aware Workload Scheduling

In order to cope with the ever-increasing data volume, distributed stream processing systems have been proposed. To ensure scalability most distributed systems partition the data and distribute the workload among multiple machines. This approach does, however, raise the question how the data and the workload should be partitioned and distributed. A uniform scheduling strategy—a uniform distribu...

Massively parallel execution of logic programs: A static approach

A static model for the parallel execution of logic programs on MIMD distributed memory systems is presented where a refutation is implemented through a process network returned by the compilation of the logic program. The model supports Restricted-AND, OR and stream parallelism and it is integrated with a set of static analyses to optimise the process network. Altogether, the processes interact...

Programmable scheduling in a stream processing

The need for frameworks to express distributed computation in a safe and reliable manner has, in the recent years, resulted in renewed interest for the dataflow programming model, which represents a program as a graph of interconnected operators that perform data transformations. Many research-oriented and industry-grade systems have employed this model to describe "streaming" transformations a...

Journal:
  • CoRR

Volume: abs/1702.01785

Publication date: 2017